Fruits are usually eaten as complementary foods because they contain good nutrients such as protein and vitamins. However, fruits can also contain potentially harmful microorganisms as a result of decay. Many artificial intelligence (AI) techniques have been proposed in research on fruit freshness, and deep learning is one of the most prominent of them. Because deep learning typically requires a lot of computing power, it usually consumes a lot of electricity. This is an important concern, especially for agribusiness companies that require AI implementations. To address these problems, we propose a six-layer convolutional neural network (CNN) model that detects fruit freshness while saving energy. Our CNN model draws 55 to 73 Watts of electrical power during training and 20 to 27 Watts during testing, and it reaches an average accuracy of 98.64%. However, compared with a previous study using the MobileNetV2 model, our model excels only in a few aspects, such as recall for fresh bananas and fresh oranges, and recall and F1-score for rotten bananas.
1. INTRODUCTION 

Technology and globalization have brought significant changes to life on Earth [1]. This is evident in the rapid spread of digitization [2] and the growing scarcity of resources [3]. People living in the current digital era are accustomed to an instant lifestyle, including their dietary habits, which affects their health [4]. Over time, more and more people have become aware that one factor affecting their health is the nutrition they get from their food [5]. The necessary nutrients can be obtained from foods such as meat, fruits, and vegetables.

Fruits are usually eaten as complementary food because they have excellent nutritional content, such as protein and vitamins [6]. However, fruits can also contain potentially harmful microorganisms such as germs and bacteria, which appear when the fruit undergoes decay [7]. Many fruit suppliers, both importers and local companies, still ship fruits that are not fit for consumption because company employees classify them inaccurately [8], and some supermarkets still sell such fruits.
Given the problem described above, detecting food spoilage, especially in fruits, is important at every stage from production in plantations to consumption. Consequently, many techniques based on computer vision and artificial intelligence (AI) have been proposed over the last few decades, as discussed by Moeslund and Granum [9]. These techniques became possible because machine learning algorithms succeeded in making computers that can learn by themselves [10], [11]; a learning approach called deep learning then emerged [12], [13] and became the current trend in AI research.

Deep learning is a family of algorithms that learn at multiple levels of abstraction [14]. It is inspired by the structure of the human brain, in which layers of neurons are interconnected and pass information to one another. This mirrors how the brain works: whenever we receive new knowledge, the brain tries to analyze it in light of what it already knows. Deep learning gives computers strong self-learning capabilities in tasks ranging from natural language processing to image processing [15]. Because deep learning can process raw input data directly, it is superior to conventional machine learning in this respect [12]. As one of the latest trends in computer science research, especially in machine learning and AI [16], deep learning has made significant breakthroughs around the world, one of which is in digital image processing [17]. The type of deep learning most often applied to digital image processing is the convolutional neural network (CNN) [18]. Besides CNNs, there are other types of neural networks, such as artificial neural networks (ANN) [19] and recurrent neural networks (RNN) [20]. CNNs are generally used for image processing, face recognition, image classification, and object detection, because a CNN can take images as input, learn which aspects or objects in an image are informative, and then distinguish one image from another.

Deep learning helps in many fields of work. In a critical area such as healthcare, deep learning can classify skin cancer [21] and breast cancer [22], diseases that cause many deaths worldwide [23]. Besides healthcare, deep learning is applied in the agricultural and plantation sector to classify oil palm fruit by maturity [24], [25], and this work was later developed into an automatic picking machine that harvests oil palm fruit according to its maturity level [26].

Beyond the agricultural and plantation applications already mentioned, another area is the detection of fruit freshness. The idea of applying deep learning to this task arose because conventional spoilage detection techniques are still slow and time-consuming [27]. To address this, methods for detecting rotten fruit based on digital image processing with machine learning were developed [28]-[30], and they have shown high potential in the agricultural and plantation industry [31].

Building on this potential, Karakaya et al. conducted a comparative study of machine learning and deep learning feature extraction [27] and found that deep learning provides better accuracy than machine learning. Chakraborty et al. then applied a deep learning model called MobileNetV2 to identify rotten fruit [32]. These fruit freshness studies have mainly sought the best accuracy by comparing models.

Despite its success in making computers self-learning, deep learning has negative impacts, such as high electricity consumption. This is a problem for companies that require AI implementations, especially agribusiness companies [33]. In 2019, researchers at the University of Massachusetts Amherst estimated that training a deep learning model could use 12,041.51 Watts of electrical power and generate up to 626,155 pounds of carbon dioxide (CO2) emissions [34]. Building deep learning models that consume less energy is therefore an important concern.

In fruit freshness research, to the best of our knowledge, no study has aimed to build, or to compare, deep learning models that are energy efficient while still providing good accuracy; previous researchers have focused on finding the best accuracy by comparing models. This is a gap in research related to fruit freshness. This study continues our previous work [35] and focuses on building an accurate and energy-efficient CNN model specifically for detecting fruit freshness, because the application of AI technology should optimize work efficiency in terms of both time and cost.


2. RESEARCH METHOD 

The method proposed in this research consists of data preprocessing and data augmentation, CNN model design, hyperparameter tuning, model building, testing, and evaluation. The research methodology is shown in Figure 1. The dataset used in this study is a publicly accessible dataset called "Fruits fresh and rotten for classification". It contains images of fresh and rotten fruit in two folders, a train folder and a test folder, which together hold 10,901 images divided into six classes: Freshapples, Freshbanana, Freshoranges, Rottenapples, Rottenbanana, and Rottenoranges (see Figure 2). In our experiment, we split the dataset into training, validation, and test sets. The training set uses 80% of the images in the train folder, and the validation set uses the remaining 20%. The test set consists of the images in the test folder. After the data are split, the next step is data preprocessing and data augmentation.

Data preprocessing resizes each image to 224x224 pixels, and data augmentation applies geometric transformations (zoom, width shift, height shift, horizontal flip, and shear). Data augmentation aims to make the CNN model able to learn and recognize geometrically or photometrically transformed data. In most cases, data augmentation has improved the performance of deep learning models, because the model learns to recognize objects in a wider variety of forms and patterns. The preprocessed and augmented data are used to train the designed CNN model and the comparison model, residual network 50 (ResNet-50) [36].
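The split, resizing, and augmentation steps above could be expressed with Keras' `ImageDataGenerator`, which the TensorFlow/Keras stack named in subsection 2.1 provides. This is a sketch under assumptions, not the authors' code: the augmentation magnitudes (0.2), the 1/255 rescaling, the class mode, and the directory paths are hypothetical because the paper does not report them. The training and validation generators share `validation_split=0.2`, so Keras takes 20% of the files in each class folder of the train folder for validation and the remaining 80% for training; only the training generator applies augmentation.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (224, 224)          # resize target stated in Section 2
TRAIN_DIR = "dataset/train"    # hypothetical path to the train folder
TEST_DIR = "dataset/test"      # hypothetical path to the test folder

# Training generator: geometric augmentation plus the 80/20 train/validation split.
# The 0.2 magnitudes and the 1/255 rescaling are assumptions, not reported values.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    zoom_range=0.2,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    shear_range=0.2,
    validation_split=0.2,
)
# Validation and test generators: same rescaling, no augmentation.
val_datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)
test_datagen = ImageDataGenerator(rescale=1.0 / 255)


def make_generators(batch_size=32):
    """Return (train, validation, test) iterators over the image folders."""
    common = dict(target_size=IMG_SIZE, batch_size=batch_size, class_mode="categorical")
    train = train_datagen.flow_from_directory(TRAIN_DIR, subset="training", **common)
    val = val_datagen.flow_from_directory(
        TRAIN_DIR, subset="validation", shuffle=False, **common)
    test = test_datagen.flow_from_directory(TEST_DIR, shuffle=False, **common)
    return train, val, test
```

Once the folders exist, `make_generators(batch_size=20)` or `make_generators(batch_size=32)` returns iterators that can be passed directly to `model.fit` and `model.evaluate`.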


[Figure 1 stages: Dataset Acquisition → Pre-Processing Data and Data Augmentation → CNN Model Design → Hyperparameter Tuning → Model Building and Testing → Evaluation]

Figure 1. Research methodology 


Figure 2. Examples of fruit classes in the datasets 


2.1. Experimental setup, CNN model design, and hyperparameter tuning 

This study was conducted on a personal computer with an Intel® Core™ i5-9400F CPU @ 2.90 GHz (6 cores), 32 GB of DDR4 RAM, and an NVIDIA GeForce RTX 2080 Ti GPU with 11 GB of memory. The computer runs the 64-bit Ubuntu 18.04 operating system, with Python 3.6 and the open-source TensorFlow library with the Keras deep learning framework. The deep learning method we apply is a six-layer CNN: six Conv2D layers with 32, 64, 128, 256, 512, and 1024 filters (3x3 kernels), each followed by MaxPooling2D with pool size (2, 2). The hyperparameters we compare to find the best model are learning rates of 0.001 and 0.0001 and batch sizes of 20 and 32, with 50 epochs, Adam as the optimizer, and rectified linear units (ReLU) as the activation function. The CNN model we designed is illustrated in Figure 3.
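The architecture described above (six Conv2D blocks with 3x3 kernels, 2x2 max pooling, and dropout of 0.3, followed by two 512-unit dense layers and a six-class output, as drawn in Figure 3) could be written in Keras roughly as follows. This is a sketch, not the authors' code: the softmax output, the categorical cross-entropy loss, and the `build_fruit_cnn` name are assumptions. Unpadded ("valid") convolutions, the Keras default, are assumed because they reproduce the output shapes and the 7,078,726 trainable parameters reported for our model.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_fruit_cnn(input_shape=(224, 224, 3), num_classes=6, dropout_rate=0.3):
    """Six Conv2D/MaxPooling2D/Dropout blocks, then a 512-512-6 dense head."""
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))  # the input layer holds no weights
    for filters in (32, 64, 128, 256, 512, 1024):
        # 3x3 "valid" convolution shrinks each side by 2; pooling then halves it
        model.add(layers.Conv2D(filters, (3, 3), activation="relu"))
        model.add(layers.MaxPooling2D(pool_size=(2, 2)))
        model.add(layers.Dropout(dropout_rate))
    model.add(layers.Flatten())
    for _ in range(2):
        model.add(layers.Dense(512, activation="relu"))
        model.add(layers.Dropout(dropout_rate))
    # Assumption: softmax over the six fruit classes
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model


model = build_fruit_cnn()
# Best configuration from Table 1: Adam with learning rate 0.0001 (batch size 32 in fit)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
```

Calling `model.summary()` on this sketch should list the same layer output shapes as the Figure 4 summary of our model, ending in a flattened 1x1x1024 feature map.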


Int J Artif Intell, Vol. 12, No. 3, September 2023: 1386-1395 




ISSN: 2252-8938 


[Figure 3 layer sequence: Input (224x224x3) → Conv2D (32, 3, 3) → MaxPooling2D (2, 2) → Dropout (0.3) → Conv2D (64, 3, 3) → MaxPooling2D (2, 2) → Dropout (0.3) → Conv2D (128, 3, 3) → MaxPooling2D (2, 2) → Dropout (0.3) → Conv2D (256, 3, 3) → MaxPooling2D (2, 2) → Dropout (0.3) → Conv2D (512, 3, 3) → MaxPooling2D (2, 2) → Dropout (0.3) → Conv2D (1024, 3, 3) → MaxPooling2D (2, 2) → Dropout (0.3) → Flatten → Dense (512) → Dropout (0.3) → Dense (512) → Dropout (0.3) → Dense (6)]

Figure 3. CNN model design 


2.2. Model building and testing 

After the CNN model is designed, it is built and trained on the prepared data, comparing the hyperparameter sets listed in subsection 2.1 to find the best model configuration. Our model has 7,078,726 trainable parameters. The ResNet-50 model has 24,902,534 parameters in total, of which only 10,246,150 are trainable. A summary of these parameters is shown in Figure 4.


Our model
Layer (type)                      Output Shape            Param #
conv2d_6 (Conv2D)                 (None, 222, 222, 32)    896
max_pooling2d_6 (MaxPooling2D)    (None, 111, 111, 32)    0
dropout_8 (Dropout)               (None, 111, 111, 32)    0
conv2d_7 (Conv2D)                 (None, 109, 109, 64)    18496
max_pooling2d_7 (MaxPooling2D)    (None, 54, 54, 64)      0
dropout_9 (Dropout)               (None, 54, 54, 64)      0
conv2d_8 (Conv2D)                 (None, 52, 52, 128)     73856
max_pooling2d_8 (MaxPooling2D)    (None, 26, 26, 128)     0
dropout_10 (Dropout)              (None, 26, 26, 128)     0
conv2d_9 (Conv2D)                 (None, 24, 24, 256)     295168
max_pooling2d_9 (MaxPooling2D)    (None, 12, 12, 256)     0
dropout_11 (Dropout)              (None, 12, 12, 256)     0
conv2d_10 (Conv2D)                (None, 10, 10, 512)     1180160
max_pooling2d_10 (MaxPooling2D)   (None, 5, 5, 512)       0
dropout_12 (Dropout)              (None, 5, 5, 512)       0
conv2d_11 (Conv2D)                (None, 3, 3, 1024)      4719616
max_pooling2d_11 (MaxPooling2D)   (None, 1, 1, 1024)      0
dropout_13 (Dropout)              (None, 1, 1, 1024)      0
flatten_1 (Flatten)               (None, 1024)            0
dense_3 (Dense)                   (None, 512)             524800
dropout_14 (Dropout)              (None, 512)             0
dense_4 (Dense)                   (None, 512)             262656
dropout_15 (Dropout)              (None, 512)             0
dense_5 (Dense)                   (None, 6)               3078
Total params: 7,078,726
Trainable params: 7,078,726
Non-trainable params: 0

ResNet-50
Layer (type)                      Output Shape            Param #
model (Model)                     (None, 2048)            23587712
dense (Dense)                     (None, 512)             1049088
dropout (Dropout)                 (None, 512)             0
dense_1 (Dense)                   (None, 512)             262656
dropout_1 (Dropout)               (None, 512)             0
dense_2 (Dense)                   (None, 6)               3078
Total params: 24,902,534
Trainable params: 10,246,150
Non-trainable params: 14,656,384


Figure 4. Summary of the total parameters 
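The totals in Figure 4 follow from the standard parameter formulas: a Conv2D layer with a k x k kernel has k*k*C_in*C_out weights plus C_out biases, and a Dense layer has N_in*N_out weights plus N_out biases. The plain-Python check below recomputes both summaries; the helper names are illustrative. For ResNet-50 it only adds the dense classification head to the 23,587,712 backbone parameters reported in Figure 4; assuming the head is fully trainable, this also implies that 8,931,328 of the backbone's weights were left trainable.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights (kernel x kernel x in x out) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units


def our_model_params(size=224, channels=3,
                     filters=(32, 64, 128, 256, 512, 1024),
                     dense_units=(512, 512), num_classes=6):
    """Return (total parameters, side length of the last feature map)."""
    total = 0
    for f in filters:
        total += conv2d_params(3, channels, f)
        size = (size - 2) // 2   # valid 3x3 conv (-2), then 2x2 max pooling (//2)
        channels = f
    units = size * size * channels  # Flatten
    for out in dense_units + (num_classes,):
        total += dense_params(units, out)
        units = out
    return total, size


def resnet50_head_params(backbone_features=2048, dense_units=(512, 512), num_classes=6):
    """Parameters of the dense head stacked on the ResNet-50 backbone."""
    total, units = 0, backbone_features
    for out in dense_units + (num_classes,):
        total += dense_params(units, out)
        units = out
    return total


BACKBONE = 23_587_712  # ResNet-50 backbone parameters, from Figure 4
print(our_model_params())                    # (7078726, 1)
print(BACKBONE + resnet50_head_params())     # 24902534
print(10_246_150 - resnet50_head_params())   # 8931328 trainable backbone weights
```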


When training is complete, it produces an output file in the protocol buffer format (.pb). This file is used in the testing process with the help of the Docker platform: it is packaged into a Docker image, which can then be accessed through an application programming interface (API). The API is called from a testing script written in Python.
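The paper does not name the serving software. A `.pb` SavedModel is commonly served from Docker with TensorFlow Serving, whose REST API accepts POST requests at `/v1/models/<name>:predict` with an `"instances"` list and returns a `"predictions"` list. The client below is a sketch under that assumption: the model name `fruit_cnn`, port 8501 (TensorFlow Serving's default REST port), and the alphabetical class order (the order Keras' `flow_from_directory` assigns to folder names) are hypothetical choices, not details from the study.

```python
import json
import urllib.request

# Hypothetical endpoint for a TensorFlow Serving container
SERVING_URL = "http://localhost:8501/v1/models/fruit_cnn:predict"
# Assumed class order (alphabetical folder names)
CLASS_NAMES = ["Freshapples", "Freshbanana", "Freshoranges",
               "Rottenapples", "Rottenbanana", "Rottenoranges"]


def build_predict_payload(images):
    """Encode already-preprocessed 224x224x3 images (nested lists) as a JSON body."""
    return json.dumps({"instances": images}).encode("utf-8")


def predict(images, url=SERVING_URL, timeout=30.0):
    """POST the images to the served model and return one probability list per image."""
    request = urllib.request.Request(
        url,
        data=build_predict_payload(images),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["predictions"]


def decode(prediction):
    """Map one probability vector to the name of its most likely class."""
    best = max(range(len(prediction)), key=prediction.__getitem__)
    return CLASS_NAMES[best]
```

A testing script would call `predict` on batches of test images, apply `decode` to each result, and write the predicted and expected labels to the CSV report described in subsection 3.2.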
3. RESULTS 
3.1. Training evaluation 

In this experimental scenario, we compare the hyperparameter sets described in subsection 2.1, training each model with each set to find the hyperparameters that give the best results for the CNN model we built. The results of the comparison are shown in Figure 5 and detailed in Table 1.


Figure 5. Graphical accuracy and loss of both models (panels (a) to (d): our model; panels (e) to (h): ResNet-50)


Table 1. Accuracy and loss scores on our model and ResNet-50

Configuration          Model       Train Acc   Valid Acc   Train Loss   Valid Loss   Figure
Bs = 20, Lr = 0.001    Our model   0.9353      0.6521      0.2077       1.0042       5(a)
                       ResNet-50   0.6647      0.6041      0.8628       1.0383       5(e)
Bs = 20, Lr = 0.0001   Our model   0.9806      0.8671      0.0539       0.3453       5(b)
                       ResNet-50   0.8595      0.7265      0.3815       0.7071       5(f)
Bs = 32, Lr = 0.001    Our model   0.9474      0.7763      0.1720       0.7765       5(c)
                       ResNet-50   0.7628      0.7566      0.6587       0.6166       5(g)
Bs = 32, Lr = 0.0001   Our model   0.9819      0.8950      0.0509       0.3053       5(d)
                       ResNet-50   0.8624      0.7685      0.3765       0.6057       5(h)


From the training results, our model with batch size = 32, activation function = ReLU, optimizer = Adam, and learning rate = 0.0001 achieves the highest validation accuracy, 0.8950 (89.50%). Meanwhile, ResNet-50, the comparison model, reaches a validation accuracy of 0.7685 (76.85%) with the same set of hyperparameters. The performance of each model is illustrated graphically in Figure 5.
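Selecting the best configuration amounts to taking the maximum validation accuracy over the four (batch size, learning rate) pairs. The snippet below replays that selection on the validation accuracies copied from Table 1; in an actual run, each entry would come from training the model for 50 epochs with Adam, which is omitted here. The helper name `best_config` is illustrative.

```python
from itertools import product

BATCH_SIZES = (20, 32)
LEARNING_RATES = (1e-3, 1e-4)
GRID = list(product(BATCH_SIZES, LEARNING_RATES))  # the four configurations

# Validation accuracy per (batch size, learning rate), copied from Table 1.
VALID_ACC = {
    "Our model": dict(zip(GRID, (0.6521, 0.8671, 0.7763, 0.8950))),
    "ResNet-50": dict(zip(GRID, (0.6041, 0.7265, 0.7566, 0.7685))),
}


def best_config(scores):
    """Return the (batch size, learning rate) pair with the highest validation accuracy."""
    config = max(scores, key=scores.get)
    return config, scores[config]


for name, scores in VALID_ACC.items():
    (batch_size, lr), acc = best_config(scores)
    print(f"{name}: batch size {batch_size}, learning rate {lr:g}, validation accuracy {acc:.4f}")
```

Both models select batch size 32 with learning rate 0.0001, which is why the remaining comparisons use that configuration for both.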

While our model is training, the GPU uses 10,011 MB of memory and draws between 55 and 73 Watts. The ResNet-50 model uses less memory, 9,863 MB, but draws more power, between 59 and 171 Watts. GPU performance while training both models is illustrated graphically in Figure 6.
Figure 6. GPU performance while training both models (panels: our model and ResNet-50)

ResNet-50 is a neural network with a large architecture, as the parameter counts in subsection 2.2 show. As a result, when training the ResNet-50 model, the GPU load increases from 17% to 44%, as illustrated in Figure 6, and the GPU draws more power under a heavier load. This is why the ResNet-50 model consumes more power than our model.
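The paper does not state which tool produced the GPU readings in Figure 6. One way to collect comparable power, memory, and load samples is NVIDIA's NVML library through its Python bindings (`pynvml`, from the `nvidia-ml-py` package), polled while training runs. The sketch below assumes a single GPU at index 0, and the names `sample_gpu` and `summarize` are hypothetical. `summarize` also integrates power over time with the trapezoidal rule, which turns a sequence of power readings into an energy figure in joules.

```python
import time


def sample_gpu(duration_s=60.0, interval_s=1.0, index=0):
    """Poll one GPU via NVML; return (seconds, watts, memory MiB, load %) tuples."""
    import pynvml  # assumption: installed via the nvidia-ml-py package

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        start = time.monotonic()
        samples = []
        while time.monotonic() - start < duration_s:
            elapsed = time.monotonic() - start
            watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # NVML reports milliwatts
            mem_mib = pynvml.nvmlDeviceGetMemoryInfo(handle).used / 2**20  # bytes to MiB
            load = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu  # percent
            samples.append((elapsed, watts, mem_mib, load))
            time.sleep(interval_s)
        return samples
    finally:
        pynvml.nvmlShutdown()


def summarize(samples):
    """Return (min W, max W, peak memory MiB, energy J) from sample tuples."""
    watts = [s[1] for s in samples]
    # Trapezoidal rule: mean power of each interval times the interval length.
    energy_j = sum((b[0] - a[0]) * (a[1] + b[1]) / 2 for a, b in zip(samples, samples[1:]))
    return min(watts), max(watts), max(s[2] for s in samples), energy_j
```

Running `sample_gpu` in a background thread while `model.fit` executes, then passing its samples to `summarize`, would give a power range like those reported above together with an energy estimate for the whole run.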


3.2. Testing evaluation 


The test results are obtained by running a Python 3 script that predicts the test images. Our trained model predicts all images in the test folder in 411.226 seconds (about 6 minutes 51 seconds), whereas the trained ResNet-50 model takes 456.351 seconds (about 7 minutes 36 seconds). The testing times of the two models can be seen in Figure 7.
When the prediction script finishes, it generates a prediction report in comma-separated values (CSV) format. The report is then mapped manually into a confusion matrix, shown in Figure 8, from which the accuracy, micro-precision, micro-recall, and micro-F1 values are determined.


Our model
                 Fresh    Fresh    Fresh     Rotten   Rotten   Rotten
                 apples   banana   oranges   apples   banana   oranges
Freshapples      384      0        0         9        0        2
Freshbanana      3        366      2         3        1        6
Freshoranges     0        0        337       2        0        49
Rottenapples     6        0        0         579      0        16
Rottenbanana     1        0        0         1        526      2
Rottenoranges    0        0        4         3        0        396

ResNet-50
                 Fresh    Fresh    Fresh     Rotten   Rotten   Rotten
                 apples   banana   oranges   apples   banana   oranges
Freshapples      268      0        37        89       0        1
Freshbanana      1        328      11        37       4        0
Freshoranges     2        3        358       19       2        4
Rottenapples     39       0        58        471      0        33
Rottenbanana     1        2        1         23       499      4
Rottenoranges    2        0        71        71       1        258


Figure 8. Confusion matrix of both models 
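The per-class scores in Tables 2 and 3 can be recomputed from the matrices in Figure 8. In the plain-Python sketch below, a class's accuracy is one-vs-rest: correct decisions for that class (true positives plus true negatives) divided by all 2,698 test images; averaging the six values reproduces the 98.64% and 93.62% averages in Table 2. The helper also returns TP divided by the row total and TP divided by the column total, which match the values Table 3 lists as precision and recall, respectively. Because both models have identical row totals (395, 381, 388, 601, 530, and 403), the rows appear to be the true classes, in which case the row-normalized score would conventionally be called recall, so the two labels may be worth double-checking. The fraction of all test images classified correctly (the trace divided by the total, which also equals micro-averaged precision, recall, and F1 in single-label classification) is a different quantity: 2,588/2,698, about 95.92%, for our model. The helper names are illustrative.

```python
CLASSES = ["Freshapples", "Freshbanana", "Freshoranges",
           "Rottenapples", "Rottenbanana", "Rottenoranges"]

# Confusion matrices as in Figure 8 (rows and columns in CLASSES order).
OUR_MODEL = [
    [384, 0, 0, 9, 0, 2],
    [3, 366, 2, 3, 1, 6],
    [0, 0, 337, 2, 0, 49],
    [6, 0, 0, 579, 0, 16],
    [1, 0, 0, 1, 526, 2],
    [0, 0, 4, 3, 0, 396],
]
RESNET50 = [
    [268, 0, 37, 89, 0, 1],
    [1, 328, 11, 37, 4, 0],
    [2, 3, 358, 19, 2, 4],
    [39, 0, 58, 471, 0, 33],
    [1, 2, 1, 23, 499, 4],
    [2, 0, 71, 71, 1, 258],
]


def per_class_scores(cm):
    """Per class: (one-vs-rest accuracy, TP/row total, TP/column total, F1)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    scores = {}
    for k in range(n):
        tp = cm[k][k]
        row_total = sum(cm[k])
        col_total = sum(cm[i][k] for i in range(n))
        # Errors involving class k: off-diagonal cells in its row and in its column
        errors = (row_total - tp) + (col_total - tp)
        by_row = tp / row_total if row_total else 0.0
        by_col = tp / col_total if col_total else 0.0
        f1 = 2 * by_row * by_col / (by_row + by_col) if by_row + by_col else 0.0
        scores[CLASSES[k]] = ((total - errors) / total, by_row, by_col, f1)
    return scores


def overall_accuracy(cm):
    """Fraction of all images on the diagonal (equals micro-averaged P, R, and F1)."""
    return sum(cm[k][k] for k in range(len(cm))) / sum(sum(row) for row in cm)


for name, cm in (("Our model", OUR_MODEL), ("ResNet-50", RESNET50)):
    scores = per_class_scores(cm)
    mean_acc = sum(s[0] for s in scores.values()) / len(scores)
    print(f"{name}: mean one-vs-rest accuracy {mean_acc:.2%}, "
          f"overall accuracy {overall_accuracy(cm):.2%}")
```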


From the mapped confusion matrices, our model with the best set of hyperparameters described in subsection 3.1 achieves better accuracy in every class than both the ResNet-50 model with the same hyperparameters and the ResNet-50 model of Karakaya et al. [27]. The per-class accuracy comparison is presented in Table 2: our model reaches an average accuracy of 98.64%, followed by the ResNet-50 of Karakaya et al. with 97.61%, and finally ResNet-50 with the same hyperparameters as our model with 93.62%. In addition to the study by Karakaya et al., we also compare our results with the study by Chakraborty et al. [32]. The evaluation scores for each class are presented in Table 3.


Table 2. Each class accuracy score

Class            Our model   ResNet-50   ResNet-50 [27]
                 Accuracy    Accuracy    Accuracy
Freshapples      99.22%      93.62%      98.67%
Freshbanana      99.44%      97.85%      99.33%
Freshoranges     97.89%      92.29%      96.50%
Rottenapples     98.52%      86.32%      96.67%
Rottenbanana     99.81%      98.59%      99.67%
Rottenoranges    96.96%      93.07%      94.83%
Average          98.64%      93.62%      97.61%


Table 3. Each class evaluation score

                 Our model                     ResNet-50                     MobileNetV2 [32]
Class            Precision  Recall  F1-Score   Precision  Recall  F1-Score   Precision  Recall  F1-Score
Freshapples      0.97       0.97    0.97       0.68       0.86    0.76       0.98       0.99    0.97
Freshbanana      0.96       1.00    0.98       0.86       0.98    0.92       0.99       0.99    0.99
Freshoranges     0.87       0.98    0.92       0.92       0.67    0.77       0.99       0.98    0.97
Rottenapples     0.96       0.97    0.97       0.78       0.66    0.72       0.98       0.99    0.98
Rottenbanana     0.99       1.00    1.00       0.94       0.99    0.96       0.99       0.99    0.99
Rottenoranges    0.98       0.84    0.91       0.64       0.86    0.73       0.99       0.98    0.98


From Table 3, our model is superior only in a few aspects when compared with the MobileNetV2 model of Chakraborty et al. [32]. We attribute this to MobileNetV2's linear bottlenecks and the shortcut connections between bottlenecks. The bottlenecks encode the model's intermediate inputs and outputs, while the inner layer encapsulates the model's ability to transform lower-level concepts (e.g., pixels) into higher-level descriptors (e.g., image categories). Finally, as with residual connections in CNNs in general, the shortcuts between bottlenecks allow faster training and better accuracy [37].
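For readers unfamiliar with the block referred to above, the Keras sketch below shows a MobileNetV2-style inverted residual block as described in [37]: a 1x1 expansion convolution with ReLU6, a 3x3 depthwise convolution with ReLU6, and a 1x1 linear projection back to a narrow bottleneck (no activation), with a shortcut that adds the block input when the stride is 1 and the channel counts match. The expansion factor of 6, the 56x56x24 input, and the function name `inverted_residual` are illustrative assumptions; this is not the configuration used by Chakraborty et al. [32].

```python
import tensorflow as tf
from tensorflow.keras import layers


def inverted_residual(x, out_channels, expansion=6, stride=1):
    """MobileNetV2-style block: expand, depthwise, linear bottleneck, optional shortcut."""
    in_channels = x.shape[-1]
    # 1x1 expansion to a wider representation, with ReLU6
    h = layers.Conv2D(in_channels * expansion, 1, padding="same", use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(max_value=6.0)(h)
    # 3x3 depthwise convolution filters each channel separately
    h = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(max_value=6.0)(h)
    # 1x1 linear projection back to the narrow bottleneck: no activation here
    h = layers.Conv2D(out_channels, 1, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    # Shortcut between bottlenecks when input and output shapes match
    if stride == 1 and in_channels == out_channels:
        h = layers.Add()([x, h])
    return h


inputs = tf.keras.Input(shape=(56, 56, 24))
outputs = inverted_residual(inputs, out_channels=24)
block = tf.keras.Model(inputs, outputs)
```

Keeping the projection linear avoids discarding information in the low-dimensional bottleneck, while the shortcut lets gradients bypass the block, which is the property credited above with faster training.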

When our model is tested, the GPU uses 356 to 367 MB of memory and draws 20 to 27 Watts. In contrast to the training process, the ResNet-50 model uses more memory than ours during testing, 390 to 404 MB, and it also draws more power, 22 to 28 Watts. GPU performance while testing both models is illustrated graphically in Figure 9.
Figure 9. GPU performance while testing both models 
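Power alone does not give energy. Multiplying the reported testing power ranges by the testing times in subsection 3.2 gives rough bounds on the GPU energy needed to predict the full test set. The calculation below assumes that the GPU draw stayed within the reported range for the whole run, so its results are bounds rather than measurements; the function name is illustrative.

```python
def energy_bounds_wh(duration_s, min_watts, max_watts):
    """Energy range in watt-hours if power stayed within [min_watts, max_watts]."""
    return duration_s * min_watts / 3600.0, duration_s * max_watts / 3600.0


# (testing time in seconds, min W, max W), from the testing results in Section 3
TEST_RUNS = {
    "Our model": (411.226, 20, 27),
    "ResNet-50": (456.351, 22, 28),
}

for name, (seconds, low_w, high_w) in TEST_RUNS.items():
    low, high = energy_bounds_wh(seconds, low_w, high_w)
    print(f"{name}: {low:.2f} to {high:.2f} Wh")
```

Under these assumptions, our model needs about 2.28 to 3.08 Wh and ResNet-50 about 2.79 to 3.55 Wh. The ranges overlap, but both the lower and the upper bound are smaller for our model.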


4. CONCLUSION 

From the experiments in this study, we built an energy-efficient and accurate CNN model. In terms of accuracy and testing time, our model outperforms ResNet-50 trained with the same hyperparameters: it reaches an average accuracy of 98.64% and predicts all the given images in a shorter time, 411.226 seconds. We also compared our accuracy with previous studies: the average accuracy of our model is 1.03 percentage points higher than that of the ResNet-50 model used in a previous study [27]. However, compared with a similar study using the MobileNetV2 model [32], our model is superior only in a few aspects, such as recall in the Freshbanana and Freshoranges classes and recall and F1-score in the Rottenbanana class. We attribute this to MobileNetV2's linear bottlenecks and shortcut connections between bottlenecks, which allow faster training and better accuracy. Beyond accuracy and testing time, we also evaluated GPU memory usage and GPU power usage during training and testing, which is necessary to support our claim of an energy-efficient CNN model. During training, our model uses more memory than ResNet-50 (10,011 MB versus 9,863 MB) but less power, 55 to 73 Watts versus 59 to 171 Watts. During testing, ResNet-50 uses more memory (390 to 404 MB versus 356 to 367 MB) and more power (22 to 28 Watts versus 20 to 27 Watts) than our model.